Recognition of off-line printed Arabic text using Hidden Markov Models

Authors

  • Husni Al-Muhtaseb
  • Sabri A. Mahmoud
  • Rami Qahwaji
Abstract

Husni A. Al-Muhtaseb and Sabri A. Mahmoud, Information and Computer Science Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia; Rami S. Qahwaji, Electronic Imaging and Media Communications Department, University of Bradford, Bradford, UK.

This paper describes a technique for automatic recognition of off-line printed Arabic text using Hidden Markov Models. Overlapping and non-overlapping hierarchical windows of different sizes are used to generate 16 features from each vertical sliding strip. Eight Arabic fonts were used for testing: Arial, Tahoma, Akhbar, Thuluth, Naskh, Simplified Arabic, Andalus, and Traditional Arabic. Experiments showed that different fonts reach their highest recognition rates at different numbers of states (5 or 7) and codebook sizes (128 or 256). Arabic text is cursive, and each character may have up to 4 different shapes depending on its position in a word. This work treats each shape as a separate class, giving a total of 126 classes (compared to 28 Arabic letters). The average recognition rates achieved were between 98.08% and 99.89% for the eight experimental fonts. The main contributions of this work are the novel hierarchical sliding window technique, which uses only 16 features per sliding window; the treatment of each shape of an Arabic character as a separate class; the avoidance of any need to segment Arabic text; and the applicability of the approach to other languages.
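The abstract does not specify the exact window layout, so the following is only a minimal sketch of how 16 features could be drawn from each vertical strip with hierarchical windows. The layout here is an assumption made for illustration, not the authors' published design: 8 non-overlapping cells, 7 overlapping windows that each span two adjacent cells, and 1 window covering the whole strip (8 + 7 + 1 = 16). The function name `strip_features` and its parameters are hypothetical.

```python
def strip_features(image, strip_width=1, cells=8):
    """Return one 16-value feature vector per vertical strip.

    image: list of rows, each a list of 0/1 pixels (1 = ink).
    The window layout is an illustrative assumption, not the paper's:
    8 non-overlapping cells + 7 overlapping windows + 1 whole-strip window.
    """
    height = len(image)
    width = len(image[0])
    if height % cells != 0:
        raise ValueError("image height must be a multiple of the cell count")
    cell_h = height // cells

    vectors = []
    # Arabic is written right to left, so slide the strip from the right edge.
    for right in range(width, 0, -strip_width):
        left = max(0, right - strip_width)
        # Ink pixels per row inside the current strip.
        row_ink = [sum(row[left:right]) for row in image]
        # Level 1: ink count in each of the 8 non-overlapping cells.
        small = [sum(row_ink[i * cell_h:(i + 1) * cell_h]) for i in range(cells)]
        # Level 2: 7 overlapping windows, each covering two adjacent cells.
        medium = [small[i] + small[i + 1] for i in range(cells - 1)]
        # Level 3: a single window covering the whole strip.
        whole = [sum(small)]
        vectors.append(small + medium + whole)
    return vectors
```

In a discrete HMM pipeline of the kind the abstract mentions, each such vector would then be mapped to a codebook index (128 or 256 entries in the reported experiments) before being passed to the models; that quantization step is not shown here.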

Similar resources

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To ease data entry and digitization, automatic recognition of printed text and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

An Empirical Evaluation of Off-line Arabic Handwriting And Printed Characters Recognition System

Handwriting recognition is a challenging task for many real-world applications such as document authentication, form processing, and historical documents. This paper presents a comparative study of off-line recognition systems for handwriting and printed characters, taking Arabic handwriting as its subject. The off-line Handwriting Recognition methods for Arabic words which being often used among then across the...

Open-vocabulary recognition of machine-printed Arabic text using hidden Markov models

In this paper, we present multi-font printed Arabic text recognition using hidden Markov models (HMMs). We propose a novel approach to the sliding window technique for feature extraction. The size and position of the cells of the sliding window adapt to the writing line of Arabic text and ink-pixel distributions. We employ a two-step approach for mixed-font text recognition, in which the input ...

Arabic Printed Word Recognition Using Windowed Bernoulli HMMs

Hidden Markov Models (HMMs) are now widely used for off-line text recognition in many languages and, in particular, Arabic. In previous work, we proposed to directly use columns of raw, binary image pixels, which are directly fed into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea was to by-pass feature...

Modèles de Markov Cachés et Modèle de Longueur pour la Reconnaissance de l’Ecriture Arabe à Basse Résolution

We present a system for automatic recognition of printed Arabic text in open-vocabulary mode at low resolution. This system is based on Hidden Markov Models. Such models have been shown to be particularly successful at solving the double problem of segmenting and recognizing signals corresponding to sequences of different states, such as recognition of speech or cursive writing. The specif...

Journal:
  • Signal Processing

Volume: 88  Issue:

Pages: -

Publication date: 2008